DETAILED ACTION

Response to Arguments 

1. Applicant's arguments filed 02/04/2009 have been fully considered but they are
not persuasive.

Argument (page 10): 

Response to argument & corresponding amendment to claims: 

When considered in view of the specification of the present invention, Examiner 
believes that the combined teaching of Marcu in view of Lee appears to teach the 
limitation of claims 1, 15, and 30 as amended. Further, the present invention
describes that "the theory of discourse analysis may include any theory of 
discourse analysis capable of identifying discourse functions in a text" (present 
invention spec. [0021] & Fig. 2, first determining a discourse theory, or any 
theory). Marcu demonstrates this by using a rhetorical structure theory (RST). 
Furthermore, this RST is consistent with the present invention, wherein the 
present invention teaches "marking discourse level structures such as the 
nucleus and satellite distinction described in Rhetorical Structures Theory" 
(present invention spec. [0021]). Marcu clearly demonstrates an RST theory for 
discourse analysis (Marcu [0021]). Marcu goes on to describe nuclei and 
satellite concepts in relation to the RST, wherein the text is thoroughly 
understood ([0077-0078]). 



Application/Control Number: 10/785,199 Page 3 

Art Unit: 2626 

Additionally, given the teaching of Marcu relative to an RST, Lee further 
describes user intervention, wherein a user can change gender, age, and speech 
rate of the synthesized speech (Lee Col. 7 lines 10-17 & Fig. 1). 

Further, Lee teaches the ability to control and adjust prosodic parameters for 
speech synthesis (Previously cited in the last office action Col. 2 lines 29-49 & 
Fig. 1). 

Though Lee deals with pictures, the multimedia/image/picture analysis is
separate from the act of receiving input text, identifying phonemic data, and
adjusting features for synthesis (i.e., TTS). This concept is well known, and thus
Lee merely uses a well-known concept to improve multimedia synchronization.
The well-known teaching of Lee with respect to receiving input text (word,
sentence, pattern, etc.), identifying phonemic data, and adjusting features for 
synthesis allows for an improvement to the combined teachings of Shriberg and 
Marcu, wherein a system that analyzes text based on a model of discourse 
analysis can now synthesize speech based on adjusted parameters. This is 
merely an act of text to speech synthesis based on adjusted feature values. 

Claim Rejections - 35 USC § 103

2. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

3. Claims 1-12, 15-26, and 30 are rejected under 35 U.S.C. 103(a) as being 
unpatentable over "Can Prosody Aid the Automatic Classification of Dialog Acts in 
Conversational Speech?" (hereinafter Shriberg) in view of Marcu et al. US 
20020046018 A1 (hereinafter Marcu) and further in view of Lee et al. US 6088673 A 
(hereinafter Lee). 

Re claims 1, 15, and 30, Shriberg teaches a method of synthesizing speech
(Page 5) using discourse function level prosodic features (Pages 14-18) comprising the 
steps of: 

determining discourse functions in the input text, the discourse functions being
determined based on a mapping between basic discourse constituents of the
determined theory of discourse analysis and a plurality of discourse functions
(Pages 8-13);

determining a model of discourse function level prosodic features (Pages 14-18); 

However, Shriberg fails to teach determining a theory of discourse analysis from
a plurality of theories of discourse analysis;

Marcu teaches a channel-based summarization process 1700 that receives the
input text. Although the embodiment described above uses sentences as the input text,
any other text segment could be used instead, for example, clauses, paragraphs, or
entire treatises. Next, in step 1704, the input text is parsed to produce a syntactic tree
in the style of FIG. 11, which is used in step 1706 as the basis of generating multiple
possible solutions (e.g., the shared-forest structure described above). If a whole text is
given as input, the text can be parsed to produce a discourse tree, and the algorithm
described here will operate on the discourse tree (Marcu [0220-0221]).

Further, Marcu teaches that a discourse structure for an input text segment (e.g., a
clause, a sentence, a paragraph or a treatise) is determined by generating a set of one 
or more discourse parsing decision rules based on a training set, and determining a 
discourse structure for the input text segment by applying the generated set of 
discourse parsing decision rules to the input text segment (Marcu [0010]). 

Furthermore, Marcu teaches that generating the set of discourse parsing decision
rules may include iteratively performing one or more operations (e.g., a shift operation 
and one or more different types of reduce operations) on a set of edus to incrementally 
build the annotated text segment associated with the set of edus. The different types of 
reduce operations may include one or more of the following six operations: reduce-ns, 
reduce-sn, reduce-nn, reduce-below-ns, reduce-below-sn, reduce-below-nn. The six 
reduce operations and the shift operation may be sufficient to derive the discourse tree 
of any input text segment (Marcu [0012]). 
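
For illustration only, the shift and reduce operations named above can be sketched as a small stack-based tree builder. This is a minimal sketch under stated assumptions, not Marcu's implementation: the names `Node` and `build_tree` are hypothetical, the action sequence is supplied by hand rather than chosen by decision rules learned from a training set, and the three reduce-below operations are left out because the cited passage does not describe how they attach nodes.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class Node:
    """A discourse tree node: a leaf holds one edu, an internal node two children."""
    status: str                       # "nucleus", "satellite", or "unattached"
    text: Optional[str] = None        # edu text for leaves, None for internal nodes
    children: Tuple["Node", ...] = ()


# Nuclearity given to the (left, right) children by each plain reduce operation.
# The reduce-below-* variants are intentionally not modeled (assumption noted above).
REDUCE_STATUS = {
    "reduce-ns": ("nucleus", "satellite"),
    "reduce-sn": ("satellite", "nucleus"),
    "reduce-nn": ("nucleus", "nucleus"),
}


def build_tree(edus: Sequence[str], actions: Sequence[str]) -> Node:
    """Apply a given sequence of shift/reduce actions to a list of edus."""
    stack: List[Node] = []
    queue = list(edus)
    for action in actions:
        if action == "shift":
            if not queue:
                raise ValueError("shift with no remaining edus")
            # Move the next edu from the input onto the stack as a leaf.
            stack.append(Node(status="unattached", text=queue.pop(0)))
        elif action in REDUCE_STATUS:
            if len(stack) < 2:
                raise ValueError(f"{action} needs two subtrees on the stack")
            # Join the top two subtrees and record their nuclearity.
            right = stack.pop()
            left = stack.pop()
            left.status, right.status = REDUCE_STATUS[action]
            stack.append(Node(status="unattached", children=(left, right)))
        else:
            raise ValueError(f"unsupported action: {action}")
    if queue or len(stack) != 1:
        raise ValueError("actions did not reduce all edus to a single tree")
    return stack[0]


tree = build_tree(
    ["A", "B", "C"],
    ["shift", "shift", "reduce-ns", "shift", "reduce-sn"],
)
```

In this example the subtree over edus A and B becomes a satellite of the nucleus C, which mirrors the nucleus and satellite distinction of RST discussed in this action.
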

Marcu clearly demonstrates an RST theory for discourse analysis (Marcu [0021]).
Marcu goes on to describe nuclei and satellite concepts in relation to the RST, wherein 
the text is thoroughly understood ([0077-0078]). 

Therefore, it would have been obvious to one of ordinary skill in the art at the
time of the invention to modify the system of Shriberg to incorporate determining a
theory of discourse analysis from a plurality of theories of discourse analysis, as taught
by Marcu, to allow for the proper rules to analyze input text, wherein the type of input
(phrases, sentences, words, etc.) determines how the structure of the text is determined,
such as by rhetorical analysis (Marcu [0012]).

However, Shriberg in view of Marcu fails to teach:

determining input text; and

determining adjusted synthesized speech output based on the discourse
functions, the model of discourse function level prosodic features (pages 14-18), and
the input text.

Lee teaches that a TTS for interlocking with multimedia according to Lee's
invention comprises: a multimedia information input unit for organizing text, prosody, the
information on synchronization with moving picture, lip-shape, and the information such
as individual property; a data distributor by each media for distributing the information of
the multimedia information input unit into the information by each media; a language
processor for converting the text distributed by the data distributor by each media into
phoneme stream, presuming prosody information and symbolizing the information; a
prosody processor for calculating a value of prosody control parameter from the
symbolized prosody information using a rule and a table; a synchronization adjuster for
adjusting the duration of the phoneme using the synchronization information distributed
by the data distributor by each media; a signal processor for producing a synthesized
speech using the prosody control parameter and data in a synthesis unit database; and
a picture output apparatus for outputting the picture information distributed by the data
distributor by each media onto a screen (Lee Col. 2 lines 29-49 & Fig. 1).
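
For illustration only, the flow of data through the components recited above can be sketched as follows. This is a toy sketch under stated assumptions, not Lee's implementation: every name (`Phoneme`, `DURATION_TABLE`, `language_processor`, `prosody_processor`, `synchronization_adjuster`, `run_pipeline`) is hypothetical, each letter stands in for a phoneme, the table values are invented, and the signal processor and picture output apparatus are omitted.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Phoneme:
    symbol: str
    duration_ms: float


# Hypothetical table: symbolized prosody label -> base duration per phoneme (ms).
DURATION_TABLE: Dict[str, float] = {"neutral": 80.0, "emphasis": 120.0}


def language_processor(text: str) -> List[str]:
    """Toy stand-in for text-to-phoneme conversion: one symbol per letter."""
    return [ch for ch in text.lower() if ch.isalpha()]


def prosody_processor(symbols: List[str], prosody_label: str) -> List[Phoneme]:
    """Look up a prosody control value (here only duration) from the table."""
    base = DURATION_TABLE[prosody_label]
    return [Phoneme(symbol, base) for symbol in symbols]


def synchronization_adjuster(phonemes: List[Phoneme], target_ms: float) -> List[Phoneme]:
    """Scale phoneme durations so the total matches the synchronization target."""
    total = sum(p.duration_ms for p in phonemes)
    if total == 0:
        return phonemes
    scale = target_ms / total
    return [Phoneme(p.symbol, p.duration_ms * scale) for p in phonemes]


def run_pipeline(multimedia_input: Dict[str, Any]) -> List[Phoneme]:
    # Data distributor step: split the combined input into per-media pieces.
    text = str(multimedia_input["text"])
    prosody_label = str(multimedia_input.get("prosody", "neutral"))
    target_ms = float(multimedia_input["sync_duration_ms"])

    symbols = language_processor(text)
    phonemes = prosody_processor(symbols, prosody_label)
    return synchronization_adjuster(phonemes, target_ms)


result = run_pipeline({"text": "Hi", "sync_duration_ms": 300})
```

The point of the sketch is only that the text path (text to phonemes to prosody parameters to adjusted durations) can be followed separately from the picture path.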

Lee further describes user intervention, wherein a user can change gender, age, 
and speech rate of the synthesized speech (Lee Col. 7 lines 10-17 & Fig. 1). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Shriberg in view of Marcu to incorporate 
determining input text and determining adjusted synthesized speech output based on 
the discourse functions, the model of discourse function level prosodic features, and the 
input text as taught by Lee to allow for the proper rules to analyze input text, wherein 
prosody control is established such as phonemic features of text in order to modify 
output speech synthesis based on input text, and words, sentences, or textual patterns 
in order to adapt in a changing environment (Lee Col. 7 lines 10-17 & Fig. 1).

Re claims 2 and 16, Shriberg teaches the method of claim 1, wherein the 
discourse functions are determined based on the determined theory of discourse 
analysis (Pages 8-13). 

Re claims 3 and 17, Shriberg fails to teach the method of claim 2, in which the
theory of discourse analysis is at least one of: the Linguistic Discourse Model, the 
Unified Linguistic Discourse Model, Rhetorical Structures Theory, Discourse Structure 
Theory and Structured Discourse Representation Theory; 

Re claims 4 and 18, Shriberg teaches the method of claim 1, wherein the output
information (Pages 4-5, Why Use Prosody?) is at least one of text information and 
application output information (Pages 8-13). 

Re claims 5 and 19, Shriberg teaches the method of claim 1, wherein
determining the adjusted synthesized speech output (Pages 4-5, Why Use Prosody?) 
further comprises the steps of: 

determining discourse function level prosodic feature adjustments (Pages 14-18); 

However, Shriberg fails to teach:

determining input text; and

determining the adjusted synthesized speech output based on the synthesized
speech output and the discourse level prosodic feature adjustments.

Lee teaches that a TTS for interlocking with multimedia according to Lee's
invention comprises: a multimedia information input unit for organizing text, prosody, the
information on synchronization with moving picture, lip-shape, and the information such
as individual property; a data distributor by each media for distributing the information of
the multimedia information input unit into the information by each media; a language
processor for converting the text distributed by the data distributor by each media into
phoneme stream, presuming prosody information and symbolizing the information; a
prosody processor for calculating a value of prosody control parameter from the
symbolized prosody information using a rule and a table; a synchronization adjuster for
adjusting the duration of the phoneme using the synchronization information distributed
by the data distributor by each media; a signal processor for producing a synthesized
speech using the prosody control parameter and data in a synthesis unit database; and
a picture output apparatus for outputting the picture information distributed by the data
distributor by each media onto a screen (Lee Col. 2 lines 29-49 & Fig. 1).

Lee further describes user intervention, wherein a user can change gender, age, 
and speech rate of the synthesized speech (Lee Col. 7 lines 10-17 & Fig. 1). 

Therefore, it would have been obvious to one of ordinary skill in the art at the 
time of the invention to modify the system of Shriberg in view of Marcu to incorporate 
determining input text and determining the adjusted synthesized speech output based 
on the synthesized speech output and the discourse level prosodic feature adjustments 
as taught by Lee to allow for the proper rules to analyze input text, wherein prosody 
control is established such as phonemic features of text in order to modify output 
speech synthesis based on input text, and words, sentences, or textual patterns in order 
to adapt in a changing environment (Lee Col. 7 lines 10-17 & Fig. 1). 

Re claims 6 and 20, Shriberg teaches the method of claim 1, wherein the
model of discourse function level prosodic features (Pages 14-18) is a predictive model 
of discourse functions (Page 19). 

Re claims 7 and 21, Shriberg teaches the method of claim 6, in which the
predictive models are determined based on at least one of: machine learning and rules 
(Page 19). 

Re claims 8 and 22, Shriberg teaches the method of claim 1, in which the
prosodic features occur in at least one of a location: preceding, within and following the 
associated discourse function (Page 14). 

Re claims 9 and 23, Shriberg teaches the method of claim 1, in which the
prosodic features are encoded within a prosodic feature vector. 

Re claims 10 and 24, Shriberg teaches the method of claim 9, in which the 
prosodic feature vector is a multimodal feature vector (Pages 14-18 & Table 10). 

Re claims 11 and 25, Shriberg teaches the method of claim 1, in which the
discourse functions include an intra-sentential discourse function (Page 8 & Table 1). 

Re claims 12 and 26, Shriberg teaches the method of claim 1, in which the 
discourse functions include an inter-sentential discourse function (Page 8 & Table 1). 

Conclusion

4. THIS ACTION IS MADE FINAL. Applicant is reminded of the extension of time 
policy as set forth in 37 CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE 
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Michael C. Colucci whose telephone number is
(571)-270-1847. The examiner can normally be reached on 9:30 am - 6:00 pm,
Monday-Friday.

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Richemond Dorvil, can be reached at (571)-272-7602. The fax phone
number for the organization where this application or proceeding is assigned is
571-273-8300.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a 
USPTO Customer Service Representative or access to the automated information 
system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000. 